pig feed1 feed2 feed3 feed4
1 60.8 68.7 92.6 87.9
2 57.0 67.7 92.1 84.2
3 65.0 74.0 90.2 83.1
4 58.6 66.3 96.5 85.7
5 61.7 69.8 99.1 90.3
All the weights need to be in one column, with a second column feed saying which feed that weight goes with.
This is the layout that aov needs.
Use pivot_longer to get it.
The result, pigs2, is in “long” format, ready for analysis.
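The reshaping described above might be sketched as follows. The data values are the ones in the table at the top; the name pigs1 for the wide data frame and the use of tribble to type the data in are assumptions made for this sketch, since the original read-in step is not shown.

```r
library(tidyverse)

# Wide format: one column of weights per feed.
# pigs1 is an assumed name; the values are copied from the table above.
pigs1 <- tribble(
  ~pig, ~feed1, ~feed2, ~feed3, ~feed4,
  1, 60.8, 68.7, 92.6, 87.9,
  2, 57.0, 67.7, 92.1, 84.2,
  3, 65.0, 74.0, 90.2, 83.1,
  4, 58.6, 66.3, 96.5, 85.7,
  5, 61.7, 69.8, 99.1, 90.3
)

# Collect the four feed columns into one weight column,
# with the old column names going into a new column feed.
pigs2 <- pigs1 %>%
  pivot_longer(feed1:feed4, names_to = "feed", values_to = "weight")
pigs2
```

Each of the 5 rows of pigs1 contributes 4 rows to pigs2, one per feed.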
pig identifies pigs within each group only: pig 1 is four different pigs!
Df Sum Sq Mean Sq F value Pr(>F)
feed 3 3521 1173.5 119.1 3.72e-11 ***
Residuals 16 158 9.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = weight ~ feed, data = pigs2)
$feed
diff lwr upr p adj
feed2-feed1 8.68 3.001038 14.358962 0.0024000
feed3-feed1 33.48 27.801038 39.158962 0.0000000
feed4-feed1 25.62 19.941038 31.298962 0.0000000
feed3-feed2 24.80 19.121038 30.478962 0.0000000
feed4-feed2 16.94 11.261038 22.618962 0.0000013
feed4-feed3 -7.86 -13.538962 -2.181038 0.0055599
All of the feeds differ!
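A sketch of code that could produce output like the ANOVA table and Tukey comparisons above. The formula and data frame name come from the "Fit:" line; the name weight.1 for the fitted model is an assumption, and the data are typed in again so that the block runs on its own.

```r
library(tidyverse)

# Rebuild the long-format data (values from the table at the top)
pigs2 <- tribble(
  ~pig, ~feed1, ~feed2, ~feed3, ~feed4,
  1, 60.8, 68.7, 92.6, 87.9,
  2, 57.0, 67.7, 92.1, 84.2,
  3, 65.0, 74.0, 90.2, 83.1,
  4, 58.6, 66.3, 96.5, 85.7,
  5, 61.7, 69.8, 99.1, 90.3
) %>%
  pivot_longer(feed1:feed4, names_to = "feed", values_to = "weight")

# One-way ANOVA of weight on feed (weight.1 is an assumed name)
weight.1 <- aov(weight ~ feed, data = pigs2)
summary(weight.1)

# All pairwise comparisons of feeds, family-wise 95% confidence
tukey <- TukeyHSD(weight.1)
tukey
```

If every adjusted P-value in the Tukey output is small, as here, every pair of feeds differs in mean weight.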
To find the best and worst, get mean weight by feed group. I borrowed an idea from earlier to put the means in descending order:
Feed 3 is best, feed 1 worst.
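One way the means by feed might be obtained and sorted, as a sketch: the original code is not shown, so the column name mean_weight and the use of arrange with desc are assumptions. The data are typed in again so the block is self-contained.

```r
library(tidyverse)

# Rebuild the long-format data (values from the table at the top)
pigs2 <- tribble(
  ~pig, ~feed1, ~feed2, ~feed3, ~feed4,
  1, 60.8, 68.7, 92.6, 87.9,
  2, 57.0, 67.7, 92.1, 84.2,
  3, 65.0, 74.0, 90.2, 83.1,
  4, 58.6, 66.3, 96.5, 85.7,
  5, 61.7, 69.8, 99.1, 90.3
) %>%
  pivot_longer(feed1:feed4, names_to = "feed", values_to = "weight")

# Mean weight for each feed, largest first
feed_means <- pigs2 %>%
  group_by(feed) %>%
  summarize(mean_weight = mean(weight)) %>%
  arrange(desc(mean_weight))
feed_means
```

The first row is then the best feed and the last row the worst.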
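m1524 is males aged 15–24. Also mu and fu, where age is unknown.
genage contains both gender and age group. Split that up using separate.
separate needs 3 things: the column to split (genage), the names of the columns to create (gender and age), and where to split (after the first character):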
tb %>%
  pivot_longer(m04:fu, names_to = "genage",
               values_to = "freq", values_drop_na = TRUE) %>%
  separate(genage, into = c("gender", "age"), sep = 1)
China started recording in 1995, which is at least part of the problem.
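To see what the pivot_longer and separate steps do, here is a sketch on a tiny made-up data frame. The name tb_toy, the columns iso2 and year, the country codes, and all the frequencies are invented for illustration and are not taken from the real tuberculosis data; only three of the gender-age columns are included.

```r
library(tidyverse)

# Made-up frequencies, for illustration only
tb_toy <- tribble(
  ~iso2, ~year, ~m04, ~m1524, ~fu,
  "AA",  2000,  5,    20,     NA,
  "BB",  2000,  NA,   7,      1
)

tb_long <- tb_toy %>%
  # column names go into genage, counts into freq, missing counts are dropped
  pivot_longer(m04:fu, names_to = "genage",
               values_to = "freq", values_drop_na = TRUE) %>%
  # split after the first character: "m1524" becomes "m" and "1524"
  separate(genage, into = c("gender", "age"), sep = 1)
tb_long
```

A numeric sep means "split at this character position", which works here because the gender code is always exactly one character long.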
Daily weather records for “Toronto City” weather station in 2018:
The numbers in the data frame are all temperatures (for different days of the month), so the first step is to collect them into one column with pivot_longer.
element contains the names of two different variables, which should each be in a separate column.
This is distinct from m1524 in the tuberculosis data, which contained levels of two different factors, handled by separate.
Untangling names of variables is handled by pivot_wider.
mutate creates new columns from old (or assign back to change a variable).
To get the day as a number, separate works, or pull out the number as below.
select keeps columns (or drops, with minus). Station name has no value to us:
weather %>%
  pivot_longer(d01:d31, names_to = "day",
               values_to = "temperature", values_drop_na = TRUE) %>%
  pivot_wider(names_from = element, values_from = temperature) %>%
  mutate(Day = parse_number(day)) %>%
  select(-station) %>%
  unite(datestr, c(Year, Month, Day), sep = "-") %>%
  mutate(date = as.Date(datestr)) %>%
  select(c(date, tmax, tmin)) -> weather_tidy
ggplot requires a single temperature column to plot against date, only we have two temperatures, one a max and one a min, that we want to keep separate.
The trick: combine tmax and tmin together into one column, keeping track of what kind of temp they are. (This is actually the same format as the untidy weather.)
We are making weather_tidy untidy for the purposes of drawing the graph only.
The column recording the kind of temperature then serves to distinguish max and min on the graph.
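The idea above might be sketched like this. The original graph code is not shown, so the details are assumptions: the toy data frame weather_toy (three made-up days, not actual Toronto readings), the column name maxmin, and the choice of colour to tell max from min.

```r
library(tidyverse)

# Toy values for illustration only, in the same layout as weather_tidy
weather_toy <- tibble(
  date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03")),
  tmax = c(-7.9, -7.1, -5.3),
  tmin = c(-18.6, -12.5, -11.2)
)

g <- weather_toy %>%
  # one temperature column, plus maxmin saying which kind each value is
  pivot_longer(tmax:tmin, names_to = "maxmin",
               values_to = "temperature") %>%
  # no data frame named here: ggplot receives the output of pivot_longer
  ggplot(aes(x = date, y = temperature, colour = maxmin)) +
  geom_point() +
  geom_line()
g
```

Mapping maxmin to colour gives one line for the maxima and one for the minima on the same axes.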
When using ggplot in a pipeline, the initial data frame is omitted, because it is whatever came out of the previous step.
Putting tmax and tmin into one column is a job for pivot_longer. I save the graph to show overleaf.
pivot_longer and pivot_wider are opposites; separate and unite are opposites.
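A small check of the "opposites" claim, as a sketch on made-up data: going longer and then wider, or separating and then uniting, should bring back what we started with. All the data frame and column names here are invented for illustration.

```r
library(tidyverse)

# pivot_longer followed by pivot_wider
wide <- tibble(id = 1:2, x = c(10, 20), y = c(30, 40))
long <- wide %>%
  pivot_longer(x:y, names_to = "name", values_to = "value")
back <- long %>%
  pivot_wider(names_from = name, values_from = value)
all.equal(as.data.frame(wide), as.data.frame(back))

# separate followed by unite
codes <- tibble(genage = c("m04", "fu"))
parts <- codes %>%
  separate(genage, into = c("gender", "age"), sep = 1)
rejoined <- parts %>%
  unite(genage, gender, age, sep = "")
identical(rejoined$genage, codes$genage)
```

The round trip works here because nothing is lost along the way; with values_drop_na = TRUE, for example, going back to wide format would reintroduce the missing values as NA.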